Biostat 212a Homework 1

Due Jan 23, 2024 @ 11:59PM

Author

Li Zhang 206305918

Published

January 23, 2024

install.packages("reticulate")

library(reticulate)

pip install pandas

sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.0

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.3.2    fastmap_1.1.1     cli_3.6.2        
 [5] tools_4.3.2       htmltools_0.5.7   rstudioapi_0.15.0 yaml_2.3.8       
 [9] rmarkdown_2.25    knitr_1.45        jsonlite_1.8.8    xfun_0.41        
[13] digest_0.6.33     rlang_1.1.2       evaluate_0.23    

1 Filling gaps in lecture notes (10pts)

Consider the regression model \[ Y = f(X) + \epsilon, \] where \(\operatorname{E}(\epsilon) = 0\).

1.1 Optimal regression function

Show that the choice \[ f_{\text{opt}}(X) = \operatorname{E}(Y | X) \] minimizes the mean squared prediction error \[ \operatorname{E}\{[Y - f(X)]^2\}, \] where the expectation averages over variations in both \(X\) and \(Y\). (Hint: condition on \(X\).)

  • Answer:

\[ \begin{align} \operatorname{E}\{[Y - f(X)]^2\}&= \operatorname{E}\{[Y - f_{opt}(X) + f_{opt}(X) -f(X)]^2\}\\ &= \operatorname{E}\{[Y - f_{opt}(X)]^2\} + \operatorname{E}\{[f_{opt}(X) -f(X)]^2\} + 2\operatorname{E}\{[Y - f_{opt}(X)][f_{opt}(X) -f(X)]\} \end{align} \]

The cross term vanishes by conditioning on \(X\): given \(X\), \(f_{opt}(X) - f(X)\) is fixed and \(\operatorname{E}\{Y - f_{opt}(X) \mid X\} = 0\), so

\[ 2\operatorname{E}\{[Y - f_{opt}(X)][f_{opt}(X) -f(X)]\} = 2\operatorname{E}\{[f_{opt}(X) -f(X)]\operatorname{E}\{Y - f_{opt}(X) \mid X\}\} = 0 \]

So,

\[ \operatorname{E}\{[Y - f(X)]^2\}= \operatorname{E}\{[Y - f_{opt}(X)]^2\} + \operatorname{E}\{[f_{opt}(X) -f(X)]^2\} \] The first term does not depend on the choice of \(f\), while the second term is nonnegative and equals zero exactly when \(f(X) = f_{opt}(X)\). Therefore, \(f_{opt}(X) = \operatorname{E}(Y | X)\) minimizes the mean squared prediction error.
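The optimality of the conditional mean can also be checked numerically. Below is a minimal simulation sketch of my own (the toy model \(Y = \sin X + \epsilon\) is an assumption for illustration, not part of the exercise): the conditional mean \(\sin x\) should beat any competitor function in mean squared prediction error.

```r
# Monte Carlo illustration: f_opt(X) = E(Y|X) minimizes MSE.
set.seed(212)
n <- 1e5
x <- runif(n, 0, 2 * pi)
y <- sin(x) + rnorm(n, sd = 0.5)      # true f(X) = sin(X), E(eps) = 0
mse_opt   <- mean((y - sin(x))^2)     # f_opt(X) = E(Y | X) = sin(X)
mse_other <- mean((y - cos(x))^2)     # an arbitrary competitor function
c(mse_opt = mse_opt, mse_other = mse_other)
```

The MSE of the conditional mean is close to \(\operatorname{Var}(\epsilon) = 0.25\), while any other candidate pays an extra \(\operatorname{E}\{[f_{opt}(X) - f(X)]^2\}\) penalty, matching the decomposition above.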

1.2 Bias-variance trade-off

Given an estimate \(\hat f\) of \(f\), show that the test error at a \(x_0\) can be decomposed as \[ \operatorname{E}\{[y_0 - \hat f(x_0)]^2\} = \underbrace{\operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2}_{\text{MSE of } \hat f(x_0) \text{ for estimating } f(x_0)} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}, \] where the expectation averages over the variability in \(y_0\) and \(\hat f\).

  • Answer:

\[ \begin{align*} \operatorname{E}\{[y_0 - \hat f(x_0)]^2\}&= \operatorname{E}\{[f(x_0) + \epsilon - \hat f(x_0)]^2\}\\ &= \operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\} + 2\operatorname{E}\{[f(x_0) - \hat f(x_0)]\epsilon\} + \operatorname{E}\{\epsilon^2\}\\&= \operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\} + 2\operatorname{E}\{[f(x_0) - \hat f(x_0)]\epsilon\} + \operatorname{Var}(\epsilon) \end{align*} \]

Because \(\hat f(x_0)\) and \(\epsilon\) are independent and \(\operatorname{E}(\epsilon) = 0\), the cross term vanishes: \(\operatorname{E}\{[f(x_0) - \hat f(x_0)]\epsilon\} = \operatorname{E}\{f(x_0) - \hat f(x_0)\}\operatorname{E}(\epsilon) = 0\). So we have

\[ \begin{align*} \operatorname{E}\{[y_0 - \hat f(x_0)]^2\}&= \operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\} + \operatorname{Var}(\epsilon)\\&= \operatorname{Var}(\hat f(x_0)) + [\operatorname{E}\{\hat f(x_0)\} - f(x_0)]^2 + \operatorname{Var}(\epsilon)\\&= \underbrace{\operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2}_{\text{MSE of } \hat f(x_0) \text{ for estimating } f(x_0)} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}} \end{align*} \] where the second equality follows by adding and subtracting \(\operatorname{E}\{\hat f(x_0)\}\) inside the square; the resulting cross term again has mean zero.
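The decomposition can be verified by simulation. The sketch below is my own illustration (the true function \(f(x) = x^2\), noise level, and sample size are assumptions chosen for the demo): repeatedly refit a deliberately misspecified linear model, then compare the Monte Carlo test error at \(x_0\) with variance + bias\(^2\) + \(\sigma^2\).

```r
# Monte Carlo check of test error = Var(fhat) + Bias^2 + Var(eps) at one x0.
set.seed(212)
x0 <- 0.9; sigma <- 1; B <- 5000
fhat_x0 <- replicate(B, {
  x <- runif(30, 0, 1)
  y <- x^2 + rnorm(30, sd = sigma)                 # true f(x) = x^2
  predict(lm(y ~ x), newdata = data.frame(x = x0)) # biased linear fit
})
y0  <- x0^2 + rnorm(B, sd = sigma)                 # independent test responses
lhs <- mean((y0 - fhat_x0)^2)                      # E{[y0 - fhat(x0)]^2}
rhs <- var(fhat_x0) + (mean(fhat_x0) - x0^2)^2 + sigma^2
c(test_error = lhs, var_plus_bias2_plus_irreducible = rhs)
```

The two quantities agree up to Monte Carlo error, and the \(\sigma^2 = 1\) floor is exactly the irreducible component.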

2 ISL Exercise 2.4.3 (10pts)

library(ggplot2)

squared_bias <- function(x) 0.002 * (-x + 10)^3
variance <- function(x) 0.002 * x^3
training_error <- function(x) 2.389 - 0.825*x + 0.176*x^2 - 0.0182*x^3 + 0.00067*x^4
test_error <- function(x) 3 - 0.6*x + 0.06*x^2
bayes_error <- function(x) rep(1, length(x))  # constant irreducible error

x <- seq(0, 10, by = 0.02)
data <- data.frame(x = x, 
                   squared_bias = squared_bias(x), 
                   variance = variance(x), 
                   training_error = training_error(x), 
                   test_error = test_error(x), 
                   bayes_error = bayes_error(x))

ggplot(data, aes(x = x)) +
  geom_line(aes(y = squared_bias, color = "Squared Bias"), linewidth = 1, linetype = "solid", alpha = 0.8) +
  geom_line(aes(y = variance, color = "Variance"), linewidth = 1, linetype = "solid", alpha = 0.8) +
  geom_line(aes(y = training_error, color = "Training Error"), linewidth = 1, linetype = "solid", alpha = 0.8) +
  geom_line(aes(y = test_error, color = "Test Error"), linewidth = 1, linetype = "solid", alpha = 0.8) +
  geom_line(aes(y = bayes_error, color = "Bayes Error"), linewidth = 1, linetype = "solid", alpha = 0.8) +
  labs(title = "Bias-Variance Tradeoff",
       x = "Model Flexibility",
       y = "Values") +
  theme_minimal()

Squared Bias: The discrepancy between the model’s approximation and the true underlying function. As model flexibility increases, a more flexible model becomes increasingly similar to the true function, leading to a diminishing squared bias.

Variance: In the case of a model with minimal flexibility, the variance is zero, as the model fit remains independent of the data. However, as flexibility increases, the variance also increases, capturing the noise in a particular training set. The variance curve is a monotonically increasing function as model flexibility grows.

Training Error: The training error is the average (squared) difference between model predictions and observations on the training set. For very inflexible models, this difference can be substantial, but with increasing flexibility (e.g., by fitting higher-degree polynomials), the additional degrees of freedom reduce the average difference, resulting in a monotone decrease in training error.

Bayes Error: This term remains constant since, by definition, it does not depend on X and, consequently, is unaffected by the flexibility of the model.

Test Error: The expected test error equals Squared Bias + Variance + Bayes (irreducible) error. It attains its minimum at an intermediate level of flexibility: neither too flexible, where variance dominates, nor too inflexible, where squared bias is high. The curve is roughly U-shaped: high for inflexible models, decreasing to a minimum as flexibility increases, then rising again as variance starts to dominate. The gap between this minimum and the Bayes irreducible error indicates how well the best function in the hypothesis space can fit.
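For the toy curve plotted above, the minimizing flexibility can be located numerically with `optimize()` (this is just a sanity check on the illustrative quadratic, not part of the exercise):

```r
# Locate the minimum of the illustrative test-error curve from the plot above.
test_error <- function(x) 3 - 0.6 * x + 0.06 * x^2  # same toy curve as the plot
opt <- optimize(test_error, interval = c(0, 10))
opt$minimum    # flexibility at the minimum; analytically 0.6 / (2 * 0.06) = 5
opt$objective  # minimum expected test error, here 1.5
```

The minimum sits at x = 5, in the interior of the flexibility range, consistent with the U-shape described above.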

3 ISL Exercise 2.4.4 (10pts)

Classification Applications:

  1. Medical diagnosis. Response: disease present or absent. Predictors: symptoms, test results, patient history, etc. Goal: Inference aiding in diagnosis and treatment planning.

  2. Spam detection. Response: spam or not spam. Predictors: email contents, email sender, etc. Goal: Prediction of spam.

  3. Face recognition. Response: identity of face. Predictors: picture of face, lighting, angle, etc. Goal: Prediction of identity.

Regression Applications:

  1. Cox proportional hazards model. Response: the time until an event occurs (survival time). Predictors: covariates or features that may influence the hazard rate over time. Goal: Prediction of survival time.

  2. Stock market prediction. Response: price of stock. Predictors: company performance, economic indicators, etc. Goal: Prediction of stock price.

  3. Educational assessment. Response: student’s grade. Predictors: student’s performance on homework, quizzes, etc. Goal: Prediction of student’s grade.

Cluster Analysis Applications:

  1. Market segmentation. Response: market segment. Predictors: customer characteristics, purchasing history, etc. Goal: Identification of distinct groups of customers.

  2. Social network analysis. Response: community. Predictors: social network connections, interests, etc. Goal: Identification of distinct groups of people.

  3. Image segmentation. Response: object. Predictors: pixel color, pixel location, etc. Goal: Identification of distinct objects in an image.

4 ISL Exercise 2.4.10 (30pts)

You can read in the Boston data set directly from the url https://raw.githubusercontent.com/ucla-biostat-212a/2024winter/master/slides/data/Boston.csv. A documentation of the Boston data set is here.

library(tidyverse)

Boston <- read_csv("https://raw.githubusercontent.com/ucla-biostat-212a/2024winter/master/slides/data/Boston.csv", col_select = -1) %>% 
  print(width = Inf)
# A tibble: 506 × 13
      crim    zn indus  chas   nox    rm   age   dis   rad   tax ptratio lstat
     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
 1 0.00632  18    2.31     0 0.538  6.58  65.2  4.09     1   296    15.3  4.98
 2 0.0273    0    7.07     0 0.469  6.42  78.9  4.97     2   242    17.8  9.14
 3 0.0273    0    7.07     0 0.469  7.18  61.1  4.97     2   242    17.8  4.03
 4 0.0324    0    2.18     0 0.458  7.00  45.8  6.06     3   222    18.7  2.94
 5 0.0690    0    2.18     0 0.458  7.15  54.2  6.06     3   222    18.7  5.33
 6 0.0298    0    2.18     0 0.458  6.43  58.7  6.06     3   222    18.7  5.21
 7 0.0883   12.5  7.87     0 0.524  6.01  66.6  5.56     5   311    15.2 12.4 
 8 0.145    12.5  7.87     0 0.524  6.17  96.1  5.95     5   311    15.2 19.2 
 9 0.211    12.5  7.87     0 0.524  5.63 100    6.08     5   311    15.2 29.9 
10 0.170    12.5  7.87     0 0.524  6.00  85.9  6.59     5   311    15.2 17.1 
    medv
   <dbl>
 1  24  
 2  21.6
 3  34.7
 4  33.4
 5  36.2
 6  28.7
 7  22.9
 8  27.1
 9  16.5
10  18.9
# ℹ 496 more rows
import pandas as pd
import io
import requests

url = "https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/data/Boston.csv"
s = requests.get(url).content
Boston = pd.read_csv(io.StringIO(s.decode('utf-8')), index_col = 0)
Boston
        crim    zn  indus  chas    nox  ...  rad  tax  ptratio  lstat  medv
1    0.00632  18.0   2.31     0  0.538  ...    1  296     15.3   4.98  24.0
2    0.02731   0.0   7.07     0  0.469  ...    2  242     17.8   9.14  21.6
3    0.02729   0.0   7.07     0  0.469  ...    2  242     17.8   4.03  34.7
4    0.03237   0.0   2.18     0  0.458  ...    3  222     18.7   2.94  33.4
5    0.06905   0.0   2.18     0  0.458  ...    3  222     18.7   5.33  36.2
..       ...   ...    ...   ...    ...  ...  ...  ...      ...    ...   ...
502  0.06263   0.0  11.93     0  0.573  ...    1  273     21.0   9.67  22.4
503  0.04527   0.0  11.93     0  0.573  ...    1  273     21.0   9.08  20.6
504  0.06076   0.0  11.93     0  0.573  ...    1  273     21.0   5.64  23.9
505  0.10959   0.0  11.93     0  0.573  ...    1  273     21.0   6.48  22.0
506  0.04741   0.0  11.93     0  0.573  ...    1  273     21.0   7.88  11.9

[506 rows x 13 columns]

4.1 a

library(ISLR2)
cat("Number of rows:", nrow(Boston), "\n")
Number of rows: 506 
cat("Number of columns:", ncol(Boston), "\n")
Number of columns: 13 

Rows: Each row corresponds to a single observation or data point. In this case, each row represents information about a specific suburb in Boston.

Columns: Each column represents a different variable or feature associated with the observations. In this case, each column provides information about a specific aspect of the housing values in these suburbs, like ‘crim’ (per capita crime rate by town), ‘zn’ (proportion of residential land zoned for lots over 25,000 sq.ft.), etc.

crim: per capita crime rate by town.

zn: proportion of residential land zoned for lots over 25,000 sq.ft.

indus: proportion of non-retail business acres per town.

chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).

nox: nitrogen oxides concentration (parts per 10 million).

rm: average number of rooms per dwelling.

age: proportion of owner-occupied units built prior to 1940.

dis: weighted mean of distances to five Boston employment centres.

rad: index of accessibility to radial highways.

tax: full-value property-tax rate per $10,000.

ptratio: pupil-teacher ratio by town.

lstat: lower status of the population (percent).

medv: median value of owner-occupied homes in $1000s.

4.2 b

library(GGally)

Boston$chas <- as.factor(Boston$chas)

g <- ggpairs(
  data = as.data.frame(Boston), 
  mapping = aes(alpha = 0.25),
  columns = c("crim", "zn", "indus", "chas", "nox", "rm", "age", "dis", "rad", "tax", "ptratio", "lstat", "medv")
) + 
labs(title = "Boston Data")

ggsave("boston.png", plot = g, width = 20, height = 8)

  1. The correlation coefficient between nox and indus is 0.764, statistically significant at the 0.001 level.

    This positive correlation suggests a strong linear relationship, indicating that as the concentration of nitrogen oxides increases, the proportion of non-retail business acres also tends to increase.

    It does not imply causation but this positive correlation may be attributed to factors such as concentration of industrial activities, urban planning, land use, and environmental policies which may need further analysis.

  2. The correlation coefficient between medv and lstat is -0.738, statistically significant at the 0.001 level.

    This negative correlation suggests a strong linear relationship, indicating that as the median value of homes decreases, the lower status of the population tends to increase.

    In other words, areas with higher proportions of lower-status populations tend to have lower median home values.

  3. The correlation coefficient between tax and indus is 0.721, statistically significant at the 0.001 level.

    This positive correlation suggests that towns with a higher proportion of non-retail business acres tend to have higher property-tax rates.
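The correlations and significance levels quoted above come from the `ggpairs` panel; they can also be verified directly with `cor.test()`. A minimal sketch of the call, shown on R's built-in `mtcars` data so it runs standalone (substitute, e.g., `Boston$nox` and `Boston$indus` for the homework data):

```r
# cor.test() reports the Pearson correlation together with the p-value
# behind statements like "significant at the 0.001 level".
ct <- cor.test(mtcars$mpg, mtcars$wt)
ct$estimate  # sample correlation (strongly negative for mpg vs. wt)
ct$p.value   # p-value for H0: true correlation is zero
```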

4.3 c

Negative Relationships:

zn: As proportion of residential land zoned for lots over 25,000 sq.ft. increases, per capita crime rate tends to decrease.

rm: An increase in the average number of rooms per dwelling is associated with a decrease in per capita crime rate.

dis: Per capita crime rate decreases as the weighted mean distance to employment centres increases.

medv: Higher median home values are associated with lower per capita crime rates.

Positive Relationships:

indus: An increase in non-retail business acreage is associated with an increase in per capita crime rate.

nox: Higher nitrogen oxides concentration is associated with higher per capita crime rates.

age: Areas with a higher proportion of older buildings tend to have higher per capita crime rates.

rad: Higher accessibility to radial highways is associated with higher per capita crime rates.

tax: Areas with higher property tax rates tend to have higher per capita crime rates.

lstat: An increase in the lower status of the population is associated with higher per capita crime rates.

4.4 d

par(mfrow=c(1,3))
boxplot(Boston$crim, xlab = "crim")
boxplot(Boston$tax, xlab = "tax")
boxplot(Boston$ptratio, xlab = "ptratio")

print(range(Boston$crim))
[1]  0.00632 88.97620
print(range(Boston$tax))
[1] 187 711
print(range(Boston$ptratio))
[1] 12.6 22.0
  • Per capita crime rate by town:

    • Majority of towns have very low crime rates, possibly between zero to five.

    • Some areas exhibit very high crime rates, exceeding 70. Outliers under the 1.5 × IQR rule range from roughly 10 to above 80, though most outlier towns sit at the lower end of that range rather than at the extreme.

    Overall, the data ranges from 0 to above 80.

  • Full-value property-tax rate per $10,000:

    • No outliers are observed in property tax rates.

    • The median (around 330) sits well below the mean (about 408), suggesting right-skewed data, ranging from 187 to 711.

  • Pupil-teacher ratio by town:

    • Outliers are present in the lower extreme of the box plot.

    • The data ranges from 12.6 to 22. The median pupil-teacher ratio is around 19.
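The outliers read off the boxplots can also be extracted programmatically with `boxplot.stats()`, which applies the same 1.5 × IQR rule that `boxplot()` uses for its dots. Applied to `Boston$crim` it counts the high-crime outlier towns; the self-contained sketch below uses a simulated right-skewed sample as a stand-in:

```r
# boxplot.stats()$out returns the points beyond 1.5 * IQR from the box,
# i.e. the outlier dots drawn by boxplot().
set.seed(1)
crim_like <- rexp(506, rate = 1 / 3.6)  # right-skewed stand-in for crim
out <- boxplot.stats(crim_like)$out
length(out)        # number of outlying values
range(crim_like)
```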

4.5 e

table(Boston$chas)

  0   1 
471  35 
  • The table above shows that 35 suburbs bound the Charles River.

4.6 f

median(Boston$ptratio)
[1] 19.05
  • The median pupil-teacher ratio among the towns in this data set is 19.05.

4.7 g

Boston[Boston$medv == min(Boston$medv), ]
# A tibble: 2 × 13
   crim    zn indus chas    nox    rm   age   dis   rad   tax ptratio lstat
  <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
1  38.4     0  18.1 0     0.693  5.45   100  1.49    24   666    20.2  30.6
2  67.9     0  18.1 0     0.693  5.68   100  1.43    24   666    20.2  23.0
# ℹ 1 more variable: medv <dbl>
  • There are two suburbs (399 & 406) that have the lowest median property values.
library(dplyr)
Boston_percentiles <- sapply(Boston[ ,-4], function(x) rank(x)/length(x)) %>%
  as.data.frame()

Boston_percentiles[c(399, 406),]
         crim        zn     indus       nox        rm      age        dis
399 0.9881423 0.3685771 0.7579051 0.8448617 0.0770751 0.958498 0.05731225
406 0.9960474 0.3685771 0.7579051 0.8448617 0.1363636 0.958498 0.04150198
          rad       tax   ptratio     lstat        medv
399 0.8705534 0.8606719 0.7519763 0.9782609 0.002964427
406 0.8705534 0.8606719 0.7519763 0.8992095 0.002964427
  • High Values:

    • crim: Both areas are near the maximum crime rate (99th percentile).

    • indus, nox: Both have a high proportion of non-retail business acres and a high nitrogen oxides concentration.

    • age: Both have a high proportion of older units built before 1940.

    • rad, tax: Both show high accessibility to radial highways (possibly near highways) and high property-tax rates.

    • ptratio: Both have a relatively high pupil-teacher ratio (about the 75th percentile).

    • lstat: Both have a high percentage of lower-status population.

  • Low Values:

    • zn: Both areas have a low proportion of residential land zoned for large lots.

    • rm: Both show a low average number of rooms per dwelling.

    • dis: Both exhibit a very short weighted mean distance to employment centres.

    • medv: Both are at the minimum median value of owner-occupied homes.

4.8 h

  • More than seven rooms per dwelling:
sum(Boston$rm > 7)
[1] 64
  • More than eight rooms per dwelling:
Boston_gt_8rooms <- Boston[Boston$rm > 8, ]
nrow(Boston_gt_8rooms)
[1] 13
prop.table(table(Boston_gt_8rooms$chas))

        0         1 
0.8461538 0.1538462 
  • 15.38% (2 of 13) bound the Charles River.
summary(Boston)
      crim                zn             indus       chas         nox        
 Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   0:471   Min.   :0.3850  
 1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1: 35   1st Qu.:0.4490  
 Median : 0.25651   Median :  0.00   Median : 9.69           Median :0.5380  
 Mean   : 3.61352   Mean   : 11.36   Mean   :11.14           Mean   :0.5547  
 3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10           3rd Qu.:0.6240  
 Max.   :88.97620   Max.   :100.00   Max.   :27.74           Max.   :0.8710  
       rm             age              dis              rad        
 Min.   :3.561   Min.   :  2.90   Min.   : 1.130   Min.   : 1.000  
 1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100   1st Qu.: 4.000  
 Median :6.208   Median : 77.50   Median : 3.207   Median : 5.000  
 Mean   :6.285   Mean   : 68.57   Mean   : 3.795   Mean   : 9.549  
 3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188   3rd Qu.:24.000  
 Max.   :8.780   Max.   :100.00   Max.   :12.127   Max.   :24.000  
      tax           ptratio          lstat            medv      
 Min.   :187.0   Min.   :12.60   Min.   : 1.73   Min.   : 5.00  
 1st Qu.:279.0   1st Qu.:17.40   1st Qu.: 6.95   1st Qu.:17.02  
 Median :330.0   Median :19.05   Median :11.36   Median :21.20  
 Mean   :408.2   Mean   :18.46   Mean   :12.65   Mean   :22.53  
 3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:16.95   3rd Qu.:25.00  
 Max.   :711.0   Max.   :22.00   Max.   :37.97   Max.   :50.00  
summary(Boston_gt_8rooms)
      crim               zn            indus        chas        nox        
 Min.   :0.02009   Min.   : 0.00   Min.   : 2.680   0:11   Min.   :0.4161  
 1st Qu.:0.33147   1st Qu.: 0.00   1st Qu.: 3.970   1: 2   1st Qu.:0.5040  
 Median :0.52014   Median : 0.00   Median : 6.200          Median :0.5070  
 Mean   :0.71879   Mean   :13.62   Mean   : 7.078          Mean   :0.5392  
 3rd Qu.:0.57834   3rd Qu.:20.00   3rd Qu.: 6.200          3rd Qu.:0.6050  
 Max.   :3.47428   Max.   :95.00   Max.   :19.580          Max.   :0.7180  
       rm             age             dis             rad        
 Min.   :8.034   Min.   : 8.40   Min.   :1.801   Min.   : 2.000  
 1st Qu.:8.247   1st Qu.:70.40   1st Qu.:2.288   1st Qu.: 5.000  
 Median :8.297   Median :78.30   Median :2.894   Median : 7.000  
 Mean   :8.349   Mean   :71.54   Mean   :3.430   Mean   : 7.462  
 3rd Qu.:8.398   3rd Qu.:86.50   3rd Qu.:3.652   3rd Qu.: 8.000  
 Max.   :8.780   Max.   :93.90   Max.   :8.907   Max.   :24.000  
      tax           ptratio          lstat           medv     
 Min.   :224.0   Min.   :13.00   Min.   :2.47   Min.   :21.9  
 1st Qu.:264.0   1st Qu.:14.70   1st Qu.:3.32   1st Qu.:41.7  
 Median :307.0   Median :17.40   Median :4.14   Median :48.3  
 Mean   :325.1   Mean   :16.36   Mean   :4.31   Mean   :44.2  
 3rd Qu.:307.0   3rd Qu.:17.40   3rd Qu.:5.12   3rd Qu.:50.0  
 Max.   :666.0   Max.   :20.20   Max.   :7.44   Max.   :50.0  

These findings suggest that census tracts with more than eight rooms per dwelling generally have favorable indicators such as low crime rates, high residential land proportions, low industrial presence, proximity to the Charles River, low nitrogen oxides concentration, spacious dwellings, newer units, moderate accessibility, moderate tax rates, low pupil-teacher ratios, low lower status percentages, and higher median home values.

5 ISL Exercise 3.7.3 (12pts)

5.1 a

Only iii is correct.

Because of the interaction term \(\hat{\beta_5} \cdot (GPA \times Level) = -10 \cdot (GPA \times Level)\), the effect of Level depends on GPA. For fixed values of IQ and GPA, the expected salary gap between college and high school graduates is \(\hat{\beta_3} + \hat{\beta_5} \cdot GPA = 35 - 10 \cdot GPA\).

This gap is negative whenever GPA \(> 3.5\), so for a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is high enough. Statements i and ii are incorrect because they treat the Level effect as unconditional on GPA.

5.2 b

\[ \widehat{Salary}=\hat{\beta_0}+\hat{\beta_1} \cdot GPA+\hat{\beta_2} \cdot IQ+\hat{\beta_3} \cdot Level+\hat{\beta_4} \cdot (GPA \times IQ)+\hat{\beta_5} \cdot (GPA \times Level) \] Substitute the given values (GPA = 4.0, IQ = 110, Level = 1 for college): \[ \widehat{Salary}=50+20 \cdot 4.0+0.07 \cdot 110+35+0.01 \cdot (4.0 \cdot 110)-10 \cdot (4.0 \cdot 1)=137.1 \] i.e., a predicted starting salary of $137,100.
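The arithmetic can be checked in R; the same helper (a hypothetical `salary` function encoding the fitted coefficients from the exercise) also makes the GPA-dependent college-vs-high-school gap from part (a) explicit:

```r
# Predicted salary (in $1000s) from the fitted model in this exercise.
salary <- function(gpa, iq, level) {
  50 + 20 * gpa + 0.07 * iq + 35 * level + 0.01 * gpa * iq - 10 * gpa * level
}
salary(gpa = 4.0, iq = 110, level = 1)   # predicted salary for part (b): 137.1
salary(gpa = 4.0, iq = 110, level = 1) -
  salary(gpa = 4.0, iq = 110, level = 0) # college gap: 35 - 10 * 4.0 = -5
```

At GPA = 4.0 the college indicator actually lowers the prediction, consistent with answer iii in part (a).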

5.3 c

False. A small \(\hat{\beta_4}\) indicates a small estimated interaction effect, but statistical significance is judged by the p-value (equivalently, the t-statistic) associated with \(\hat{\beta_4}\), not by its magnitude. If the p-value is small (usually below a significance level like 0.05), it provides evidence against the null hypothesis that the interaction effect is zero, regardless of how small the point estimate is.

6 ISL Exercise 3.7.15 (20pts)

6.1 a

Simple Linear Regression Models:

lm.zn = lm(crim~zn, data=Boston)
lm.indus = lm(crim~indus, data=Boston)
lm.chas = lm(crim~chas, data=Boston)
lm.nox = lm(crim~nox, data=Boston)
lm.rm = lm(crim~rm, data=Boston)
lm.age = lm(crim~age, data=Boston)
lm.dis = lm(crim~dis, data=Boston)
lm.rad = lm(crim~rad, data=Boston)
lm.tax = lm(crim~tax, data=Boston)
lm.ptratio = lm(crim~ptratio, data=Boston)
lm.lstat = lm(crim~lstat, data=Boston)
lm.medv = lm(crim~medv, data=Boston)
summary(lm.zn)

Call:
lm(formula = crim ~ zn, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-4.429 -4.222 -2.620  1.250 84.523 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.45369    0.41722  10.675  < 2e-16 ***
zn          -0.07393    0.01609  -4.594 5.51e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.435 on 504 degrees of freedom
Multiple R-squared:  0.04019,   Adjusted R-squared:  0.03828 
F-statistic:  21.1 on 1 and 504 DF,  p-value: 5.506e-06
summary(lm.indus)

Call:
lm(formula = crim ~ indus, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.972  -2.698  -0.736   0.712  81.813 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.06374    0.66723  -3.093  0.00209 ** 
indus        0.50978    0.05102   9.991  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.866 on 504 degrees of freedom
Multiple R-squared:  0.1653,    Adjusted R-squared:  0.1637 
F-statistic: 99.82 on 1 and 504 DF,  p-value: < 2.2e-16
summary(lm.chas)

Call:
lm(formula = crim ~ chas, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-3.738 -3.661 -3.435  0.018 85.232 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.7444     0.3961   9.453   <2e-16 ***
chas1        -1.8928     1.5061  -1.257    0.209    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.597 on 504 degrees of freedom
Multiple R-squared:  0.003124,  Adjusted R-squared:  0.001146 
F-statistic: 1.579 on 1 and 504 DF,  p-value: 0.2094
summary(lm.nox)

Call:
lm(formula = crim ~ nox, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-12.371  -2.738  -0.974   0.559  81.728 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -13.720      1.699  -8.073 5.08e-15 ***
nox           31.249      2.999  10.419  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.81 on 504 degrees of freedom
Multiple R-squared:  0.1772,    Adjusted R-squared:  0.1756 
F-statistic: 108.6 on 1 and 504 DF,  p-value: < 2.2e-16
summary(lm.rm)

Call:
lm(formula = crim ~ rm, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-6.604 -3.952 -2.654  0.989 87.197 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   20.482      3.365   6.088 2.27e-09 ***
rm            -2.684      0.532  -5.045 6.35e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.401 on 504 degrees of freedom
Multiple R-squared:  0.04807,   Adjusted R-squared:  0.04618 
F-statistic: 25.45 on 1 and 504 DF,  p-value: 6.347e-07
summary(lm.age)

Call:
lm(formula = crim ~ age, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-6.789 -4.257 -1.230  1.527 82.849 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.77791    0.94398  -4.002 7.22e-05 ***
age          0.10779    0.01274   8.463 2.85e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.057 on 504 degrees of freedom
Multiple R-squared:  0.1244,    Adjusted R-squared:  0.1227 
F-statistic: 71.62 on 1 and 504 DF,  p-value: 2.855e-16
summary(lm.dis)

Call:
lm(formula = crim ~ dis, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-6.708 -4.134 -1.527  1.516 81.674 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.4993     0.7304  13.006   <2e-16 ***
dis          -1.5509     0.1683  -9.213   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.965 on 504 degrees of freedom
Multiple R-squared:  0.1441,    Adjusted R-squared:  0.1425 
F-statistic: 84.89 on 1 and 504 DF,  p-value: < 2.2e-16
summary(lm.rad)

Call:
lm(formula = crim ~ rad, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.164  -1.381  -0.141   0.660  76.433 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.28716    0.44348  -5.157 3.61e-07 ***
rad          0.61791    0.03433  17.998  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.718 on 504 degrees of freedom
Multiple R-squared:  0.3913,    Adjusted R-squared:   0.39 
F-statistic: 323.9 on 1 and 504 DF,  p-value: < 2.2e-16
summary(lm.tax)

Call:
lm(formula = crim ~ tax, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-12.513  -2.738  -0.194   1.065  77.696 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -8.528369   0.815809  -10.45   <2e-16 ***
tax          0.029742   0.001847   16.10   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.997 on 504 degrees of freedom
Multiple R-squared:  0.3396,    Adjusted R-squared:  0.3383 
F-statistic: 259.2 on 1 and 504 DF,  p-value: < 2.2e-16
summary(lm.ptratio)

Call:
lm(formula = crim ~ ptratio, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-7.654 -3.985 -1.912  1.825 83.353 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -17.6469     3.1473  -5.607 3.40e-08 ***
ptratio       1.1520     0.1694   6.801 2.94e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.24 on 504 degrees of freedom
Multiple R-squared:  0.08407,   Adjusted R-squared:  0.08225 
F-statistic: 46.26 on 1 and 504 DF,  p-value: 2.943e-11
summary(lm.lstat)

Call:
lm(formula = crim ~ lstat, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-13.925  -2.822  -0.664   1.079  82.862 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.33054    0.69376  -4.801 2.09e-06 ***
lstat        0.54880    0.04776  11.491  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.664 on 504 degrees of freedom
Multiple R-squared:  0.2076,    Adjusted R-squared:  0.206 
F-statistic:   132 on 1 and 504 DF,  p-value: < 2.2e-16
summary(lm.medv)

Call:
lm(formula = crim ~ medv, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-9.071 -4.022 -2.343  1.298 80.957 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 11.79654    0.93419   12.63   <2e-16 ***
medv        -0.36316    0.03839   -9.46   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.934 on 504 degrees of freedom
Multiple R-squared:  0.1508,    Adjusted R-squared:  0.1491 
F-statistic: 89.49 on 1 and 504 DF,  p-value: < 2.2e-16

Significant associations were found between the crime rate (crim) and the following variables in the regression models: zn, indus, nox, rm, age, dis, rad, tax, ptratio, lstat, and medv.

However, there was no statistically significant association between crime rate and the chas variable.
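The twelve separate `lm()` calls above can also be generated in a loop and their slope p-values collected in one pass. A sketch of the pattern, shown on the built-in `mtcars` data so it runs standalone (substitute `Boston` and `"crim"` for the homework data):

```r
# Fit one simple regression of the response on each remaining predictor
# and collect the slope p-values, mirroring the per-predictor fits above.
response   <- "mpg"
predictors <- setdiff(names(mtcars), response)
pvals <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response), data = mtcars)
  summary(fit)$coefficients[2, 4]   # Pr(>|t|) for the slope
})
sort(pvals)   # predictors ordered by strength of marginal association
```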

6.2 b

Multiple Linear Regression Models:

model_multiple <- lm(crim ~ ., data = Boston)
summary(model_multiple)

Call:
lm(formula = crim ~ ., data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-8.534 -2.248 -0.348  1.087 73.923 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.7783938  7.0818258   1.946 0.052271 .  
zn           0.0457100  0.0187903   2.433 0.015344 *  
indus       -0.0583501  0.0836351  -0.698 0.485709    
chas1       -0.8253776  1.1833963  -0.697 0.485841    
nox         -9.9575865  5.2898242  -1.882 0.060370 .  
rm           0.6289107  0.6070924   1.036 0.300738    
age         -0.0008483  0.0179482  -0.047 0.962323    
dis         -1.0122467  0.2824676  -3.584 0.000373 ***
rad          0.6124653  0.0875358   6.997 8.59e-12 ***
tax         -0.0037756  0.0051723  -0.730 0.465757    
ptratio     -0.3040728  0.1863598  -1.632 0.103393    
lstat        0.1388006  0.0757213   1.833 0.067398 .  
medv        -0.2200564  0.0598240  -3.678 0.000261 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.46 on 493 degrees of freedom
Multiple R-squared:  0.4493,    Adjusted R-squared:  0.4359 
F-statistic: 33.52 on 12 and 493 DF,  p-value: < 2.2e-16

For the predictors zn, dis, rad, and medv, we can reject the null hypothesis as their p-values are less than 0.05.

6.3 c

univariate_coefficients <- sapply(Boston[, -1], function(x) lm(crim ~ x, data = Boston)$coefficients[2])
multiple_coefficients <- coef(lm(crim ~ ., data = Boston))
coefficients_df <- data.frame(Univariate = univariate_coefficients, Multiple = multiple_coefficients[-1], Predictor = colnames(Boston)[-1])

library(ggplot2)
ggplot(coefficients_df, aes(x = Univariate, y = Multiple, label = Predictor)) +
  geom_point(position = position_jitter(width = 0.2, height = 0.1), size = 3, color = "blue", alpha = 0.7) +
  geom_text(hjust = 0, vjust = 0, size = 4) +
  labs(title = "Comparison of Univariate and Multiple Regression Coefficients",
       x = "Univariate Coefficients", y = "Multiple Coefficients") 

6.4 d

lm_zn <- lm(crim ~ poly(zn, 3), data = Boston)
summary(lm_zn) # orders 1 and 2 are significant

Call:
lm(formula = crim ~ poly(zn, 3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-4.821 -4.614 -1.294  0.473 84.130 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    3.6135     0.3722   9.709  < 2e-16 ***
poly(zn, 3)1 -38.7498     8.3722  -4.628  4.7e-06 ***
poly(zn, 3)2  23.9398     8.3722   2.859  0.00442 ** 
poly(zn, 3)3 -10.0719     8.3722  -1.203  0.22954    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.372 on 502 degrees of freedom
Multiple R-squared:  0.05824,   Adjusted R-squared:  0.05261 
F-statistic: 10.35 on 3 and 502 DF,  p-value: 1.281e-06
lm_indus <- lm(crim ~ poly(indus, 3), data = Boston)
summary(lm_indus) # orders 1, 2, and 3 are significant

Call:
lm(formula = crim ~ poly(indus, 3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-8.278 -2.514  0.054  0.764 79.713 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)        3.614      0.330  10.950  < 2e-16 ***
poly(indus, 3)1   78.591      7.423  10.587  < 2e-16 ***
poly(indus, 3)2  -24.395      7.423  -3.286  0.00109 ** 
poly(indus, 3)3  -54.130      7.423  -7.292  1.2e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.423 on 502 degrees of freedom
Multiple R-squared:  0.2597,    Adjusted R-squared:  0.2552 
F-statistic: 58.69 on 3 and 502 DF,  p-value: < 2.2e-16
# chas is skipped: poly() is not meaningful for a qualitative (0/1) predictor

lm_nox <- lm(crim ~ poly(nox, 3), data = Boston)
summary(lm_nox) # orders 1, 2, and 3 are significant

Call:
lm(formula = crim ~ poly(nox, 3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-9.110 -2.068 -0.255  0.739 78.302 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     3.6135     0.3216  11.237  < 2e-16 ***
poly(nox, 3)1  81.3720     7.2336  11.249  < 2e-16 ***
poly(nox, 3)2 -28.8286     7.2336  -3.985 7.74e-05 ***
poly(nox, 3)3 -60.3619     7.2336  -8.345 6.96e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.234 on 502 degrees of freedom
Multiple R-squared:  0.297, Adjusted R-squared:  0.2928 
F-statistic: 70.69 on 3 and 502 DF,  p-value: < 2.2e-16
lm_rm <- lm(crim ~ poly(rm, 3), data = Boston)
summary(lm_rm) # orders 1 and 2 are significant

Call:
lm(formula = crim ~ poly(rm, 3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.485  -3.468  -2.221  -0.015  87.219 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    3.6135     0.3703   9.758  < 2e-16 ***
poly(rm, 3)1 -42.3794     8.3297  -5.088 5.13e-07 ***
poly(rm, 3)2  26.5768     8.3297   3.191  0.00151 ** 
poly(rm, 3)3  -5.5103     8.3297  -0.662  0.50858    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.33 on 502 degrees of freedom
Multiple R-squared:  0.06779,   Adjusted R-squared:  0.06222 
F-statistic: 12.17 on 3 and 502 DF,  p-value: 1.067e-07
lm_age <- lm(crim ~ poly(age, 3), data = Boston)
summary(lm_age) # orders 1, 2, and 3 are significant

Call:
lm(formula = crim ~ poly(age, 3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-9.762 -2.673 -0.516  0.019 82.842 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     3.6135     0.3485  10.368  < 2e-16 ***
poly(age, 3)1  68.1820     7.8397   8.697  < 2e-16 ***
poly(age, 3)2  37.4845     7.8397   4.781 2.29e-06 ***
poly(age, 3)3  21.3532     7.8397   2.724  0.00668 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.84 on 502 degrees of freedom
Multiple R-squared:  0.1742,    Adjusted R-squared:  0.1693 
F-statistic: 35.31 on 3 and 502 DF,  p-value: < 2.2e-16
lm_dis <- lm(crim ~ poly(dis, 3), data = Boston)
summary(lm_dis) # orders 1, 2, and 3 are significant

Call:
lm(formula = crim ~ poly(dis, 3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.757  -2.588   0.031   1.267  76.378 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     3.6135     0.3259  11.087  < 2e-16 ***
poly(dis, 3)1 -73.3886     7.3315 -10.010  < 2e-16 ***
poly(dis, 3)2  56.3730     7.3315   7.689 7.87e-14 ***
poly(dis, 3)3 -42.6219     7.3315  -5.814 1.09e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.331 on 502 degrees of freedom
Multiple R-squared:  0.2778,    Adjusted R-squared:  0.2735 
F-statistic: 64.37 on 3 and 502 DF,  p-value: < 2.2e-16
lm_rad <- lm(crim ~ poly(rad, 3), data = Boston)
summary(lm_rad) # orders 1 and 2 are significant

Call:
lm(formula = crim ~ poly(rad, 3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.381  -0.412  -0.269   0.179  76.217 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     3.6135     0.2971  12.164  < 2e-16 ***
poly(rad, 3)1 120.9074     6.6824  18.093  < 2e-16 ***
poly(rad, 3)2  17.4923     6.6824   2.618  0.00912 ** 
poly(rad, 3)3   4.6985     6.6824   0.703  0.48231    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.682 on 502 degrees of freedom
Multiple R-squared:    0.4, Adjusted R-squared:  0.3965 
F-statistic: 111.6 on 3 and 502 DF,  p-value: < 2.2e-16
lm_tax <- lm(crim ~ poly(tax, 3), data = Boston)
summary(lm_tax) # orders 1 and 2 are significant

Call:
lm(formula = crim ~ poly(tax, 3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-13.273  -1.389   0.046   0.536  76.950 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     3.6135     0.3047  11.860  < 2e-16 ***
poly(tax, 3)1 112.6458     6.8537  16.436  < 2e-16 ***
poly(tax, 3)2  32.0873     6.8537   4.682 3.67e-06 ***
poly(tax, 3)3  -7.9968     6.8537  -1.167    0.244    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.854 on 502 degrees of freedom
Multiple R-squared:  0.3689,    Adjusted R-squared:  0.3651 
F-statistic:  97.8 on 3 and 502 DF,  p-value: < 2.2e-16
lm_ptratio <- lm(crim ~ poly(ptratio, 3), data = Boston)
summary(lm_ptratio) # orders 1, 2, and 3 are significant

Call:
lm(formula = crim ~ poly(ptratio, 3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-6.833 -4.146 -1.655  1.408 82.697 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)          3.614      0.361  10.008  < 2e-16 ***
poly(ptratio, 3)1   56.045      8.122   6.901 1.57e-11 ***
poly(ptratio, 3)2   24.775      8.122   3.050  0.00241 ** 
poly(ptratio, 3)3  -22.280      8.122  -2.743  0.00630 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.122 on 502 degrees of freedom
Multiple R-squared:  0.1138,    Adjusted R-squared:  0.1085 
F-statistic: 21.48 on 3 and 502 DF,  p-value: 4.171e-13
lm_lstat <- lm(crim ~ poly(lstat, 3), data = Boston)
summary(lm_lstat) # orders 1 and 2 are significant

Call:
lm(formula = crim ~ poly(lstat, 3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.234  -2.151  -0.486   0.066  83.353 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       3.6135     0.3392  10.654   <2e-16 ***
poly(lstat, 3)1  88.0697     7.6294  11.543   <2e-16 ***
poly(lstat, 3)2  15.8882     7.6294   2.082   0.0378 *  
poly(lstat, 3)3 -11.5740     7.6294  -1.517   0.1299    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.629 on 502 degrees of freedom
Multiple R-squared:  0.2179,    Adjusted R-squared:  0.2133 
F-statistic: 46.63 on 3 and 502 DF,  p-value: < 2.2e-16
lm_medv <- lm(crim ~ poly(medv, 3), data = Boston)
summary(lm_medv) # orders 1, 2, and 3 are significant

Call:
lm(formula = crim ~ poly(medv, 3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-24.427  -1.976  -0.437   0.439  73.655 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)       3.614      0.292  12.374  < 2e-16 ***
poly(medv, 3)1  -75.058      6.569 -11.426  < 2e-16 ***
poly(medv, 3)2   88.086      6.569  13.409  < 2e-16 ***
poly(medv, 3)3  -48.033      6.569  -7.312 1.05e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.569 on 502 degrees of freedom
Multiple R-squared:  0.4202,    Adjusted R-squared:  0.4167 
F-statistic: 121.3 on 3 and 502 DF,  p-value: < 2.2e-16

Answer: Yes, there is evidence of a non-linear association with crim for most predictors; chas is excluded because it is a qualitative predictor. See the inline comments above for which polynomial terms are significant in each fit.
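R's `poly()` fits each cubic on an orthogonal polynomial basis, which is why the t test on each degree can be read term by term. A rough NumPy analogue of that construction (an illustrative sketch via QR of the Vandermonde matrix, not R's exact algorithm) is:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)

# QR-decompose the Vandermonde matrix [1, x, x^2, x^3]; the columns of Q
# beyond the first form an orthonormal polynomial basis that is also
# orthogonal to the intercept, loosely analogous to R's poly(x, 3)
V = np.vander(x, 4, increasing=True)
Q, _ = np.linalg.qr(V)
basis = Q[:, 1:]
```

Because the basis columns are mutually orthogonal (and orthogonal to the constant), the coefficient on the degree-3 column measures the cubic contribution alone, unlike a regression on raw powers \(x, x^2, x^3\), which are highly collinear.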

7 Bonus question (20pts)

For multiple linear regression, show that \(R^2\) is equal to the correlation between the response vector \(\mathbf{y} = (y_1, \ldots, y_n)^T\) and the fitted values \(\hat{\mathbf{y}} = (\hat y_1, \ldots, \hat y_n)^T\). That is \[ R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2. \]

Answer: Recall that the coefficient of determination is defined as: \[R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}\] Where RSS is the residual sum of squares and TSS is the total sum of squares.

The total sum of squares is defined as: \[\text{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2\] Where \(\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i\) is the mean of the response values.

The residual sum of squares is defined as: \[\text{RSS} = \sum_{i=1}^n (y_i - \hat{y}_i)^2\] Where \(\hat{y}_i\) is the \(i\)th fitted value from the regression.

Now the correlation between the response vector \(\mathbf{y}\) and fitted values \(\hat{\mathbf{y}}\) is defined as: \[\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{\sum_{i=1}^n (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^n(y_i - \bar{y})^2\sum_{i=1}^n(\hat{y}_i - \bar{\hat{y}})^2}}\]

Note that \(\bar{\hat{y}} = \frac{1}{n}\sum_{i=1}^n \hat{y}_i = \bar{y}\): when the model includes an intercept, the normal equations force the residuals to sum to zero, \(\sum_{i=1}^n (y_i - \hat{y}_i) = 0\), so the fitted values have the same mean as the response values.

Therefore, writing \(\text{SSR} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2\) for the regression (explained) sum of squares, the correlation simplifies to: \[\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{\sum_{i=1}^n (y_i - \bar{y})(\hat{y}_i - \bar{y})}{\sqrt{\text{TSS} \cdot \text{SSR}}}\]

The normal equations also make the residuals orthogonal to the fitted values, \(\sum_{i=1}^n (y_i - \hat{y}_i)\hat{y}_i = 0\). Combined with \(\sum_{i=1}^n (y_i - \hat{y}_i) = 0\), this gives \[\sum_{i=1}^n (y_i - \bar{y})(\hat{y}_i - \bar{y}) = \sum_{i=1}^n \left[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\right](\hat{y}_i - \bar{y}) = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 = \text{SSR}\]

The same orthogonality yields the decomposition \(\text{TSS} = \text{RSS} + \text{SSR}\), so \(\text{SSR} = \text{TSS} - \text{RSS}\).

Squaring the correlation then gives: \[[\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2 = \frac{\text{SSR}^2}{\text{TSS} \cdot \text{SSR}} = \frac{\text{SSR}}{\text{TSS}} = \frac{\text{TSS} - \text{RSS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}} = R^2\]

Therefore, we have shown that \(R^2 = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2\).
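The identity can also be checked numerically. A small NumPy sketch on synthetic data (not part of the derivation) fits a multiple regression by least squares and compares the two quantities:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Design matrix with an intercept column and two random predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

# Least-squares fit and fitted values
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

tss = np.sum((y - y.mean()) ** 2)
rss = np.sum((y - y_hat) ** 2)
r_squared = 1 - rss / tss
corr_squared = np.corrcoef(y, y_hat)[0, 1] ** 2
```

Up to floating-point error, `r_squared` and `corr_squared` agree, and the fitted values have the same mean as the responses, as used in the proof.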